Abstract

In the theory of compressed sensing, restricted isometry analysis has become a standard
tool for studying how efficiently a measurement matrix acquires information about
sparse and compressible signals. Many recovery algorithms are known to succeed when
the restricted isometry constants of the sampling matrix are small. Many potential
applications of compressed sensing involve a data-acquisition process that proceeds by
convolution with a random pulse followed by (nonrandom) subsampling. At present,
the theoretical analysis of this measurement technique is lacking. This paper demonstrates
that the $s$th order restricted isometry constant is small when the number $m$
of samples satisfies $m \gtrsim (s \log n)^{3/2}$, where $n$ is the length of the pulse. This bound
improves on previous estimates, which exhibit quadratic scaling.

1 Introduction 

The theory of compressed sensing [7l[^[TT | [T6 l [T8 | [38] predicts that a small number of linear 
samples suffice to capture all the information in a sparse vector and that, furthermore, we 
can recover the sparse vector from these samples using efficient algorithms. This discovery 
has a number of potential applications in signal processing, as well as other areas of science 
and technology. 

The linear data acquisition process is described by a measurement matrix. The restricted 
isometry property (RIP) [TOlllIlCEHlEH] is a standard tool for studying how efficiently this 
matrix captures information about sparse signals. The RIP also streamlines the analysis 
of signal reconstruction algorithms. It is unknown whether any deterministic measurement 
matrix satisfies the RIP with the optimal scaling behavior. See, e.g., the discussion in [38,
Sec. 2.5] or [18, Sec. 5.1]. In contrast, a variety of random measurement matrices exhibit
the RIP with optimal scaling, including Gaussian matrices and Rademacher matrices [3].

Although Gaussian random matrices are optimal for sparse recovery, they have limited
use in practice because many measurement technologies impose structure on the matrix. 
Furthermore, recovery algorithms tend to be more efficient when the matrix admits a fast 
matrix-vector multiply. For example, random sets of rows from a Fourier transform matrix 
model the measurement process in MRI imaging, and these partial Fourier matrices lead 
to fast recovery algorithms because they can be applied using the FFT. It is known that a 
partial Fourier matrix satisfies a near-optimal RIP [11, 36, 38, 44]; the paper [38] contains
some generalizations, and we refer to 00] for a variation related to recovery of sparse 
Legendre expansions. 

Many potential applications of compressed sensing involve sampling processes that can be 
modeled by convolution with a random pulse. This measurement process can be modeled 
using a random circulant matrix. When we retain only a limited number of samples from 
the output of the convolution, the measurement process is described by a partial random 
circulant matrix. This situation has been studied in several works from the compressed 
sensing literature, including [2, 25, 37, 43, 48]. So far, the best available analysis of a partial
random circulant matrix suggests that its restricted isometry constants do not exhibit 
optimal scaling. This work describes a new analysis that dramatically improves the previous 
estimates. Nevertheless, our results still fall short of the optimal scaling that one might 
hope for. 

1.1 Compressed Sensing 

The compressed sensing problem considers how to recover a vector $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$
from the linear image

$$ y = Ax, $$

where the matrix $A \in \mathbb{R}^{m \times n}$ and $m \ll n$. Clearly, it is impossible to reconstruct the
vector $x$ without additional prior information. Compressed sensing introduces the extra
assumption that $x$ is $s$-sparse, i.e., $\|x\|_0 := \#\{\ell : x_\ell \neq 0\} \le s$ for some $s \ll n$. More
generally, we assume that $x$ is well-approximated by a sparse vector.

The naive approach of reconstructing $x$ by solving the $\ell_0$-minimization problem,

$$ \min_z \|z\|_0 \quad \text{subject to} \quad Az = y, $$

is NP-hard [32]. Therefore, several tractable heuristics have been proposed in the literature
as alternatives to $\ell_0$-minimization, most notably greedy algorithms [4, 20, 33, 34, 46] and
$\ell_1$-minimization [9, 13, 16]. The latter approach consists in solving the convex program

$$ \min_z \|z\|_1 \quad \text{subject to} \quad Az = y, \tag{1.1} $$

where $\|\cdot\|_p$ denotes the usual $\ell_p$ vector norm.

The restricted isometry property (RIP) offers a very elegant way to analyze $\ell_1$-minimization
and greedy algorithms. Define the restricted isometry constant $\delta_s$ of an $m \times n$ matrix $A$ to
be the smallest positive number that satisfies

$$ (1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2 \quad \text{for all } x \text{ with } \|x\|_0 \le s. \tag{1.2} $$

In words, the statement (1.2) requires that all column submatrices of $A$ with at most $s$
columns are well-conditioned. Informally, $A$ is said to satisfy the RIP (with order $s$) when
$\delta_s$ is small (for $s$ close to $m$).
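
To make the definition concrete, the following minimal sketch computes the order-2 restricted isometry constant of a tiny real matrix by brute force. It relies on the standard fact that the two-sided inequality in the definition holds for all 2-sparse vectors exactly when every 2-column Gram matrix has its eigenvalues within distance $\delta_2$ of 1. The function name `rip_constant_order2` and the example matrix are illustrative assumptions, not part of the paper, and the enumeration is only feasible for very small $n$.

```python
import itertools
import math

def rip_constant_order2(A):
    """Brute-force delta_2 for a real matrix A given as a list of rows (needs >= 2 columns)."""
    cols = list(zip(*A))
    delta = 0.0
    for u, v in itertools.combinations(cols, 2):
        # Entries of the 2x2 Gram matrix [[a, b], [b, c]] of the column pair.
        a = sum(t * t for t in u)
        b = sum(p * q for p, q in zip(u, v))
        c = sum(t * t for t in v)
        # Closed-form eigenvalues mid +/- rad of a symmetric 2x2 matrix.
        mid = (a + c) / 2.0
        rad = math.hypot((a - c) / 2.0, b)
        # Supports of size one are covered as well, because the diagonal
        # entries a and c lie between the two eigenvalues.
        delta = max(delta, abs(mid + rad - 1.0), abs(mid - rad - 1.0))
    return delta

# Two unit-norm columns with inner product 0.6 (hypothetical example data).
A = [[1.0, 0.6], [0.0, 0.8]]
delta_2 = rip_constant_order2(A)
```

For this example the Gram matrix has eigenvalues $1 \pm 0.6$, so the routine returns a value of approximately 0.6; an orthonormal pair of columns gives 0.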

A number of recovery algorithms are provably effective for sparse recovery if the matrix $A$
satisfies the RIP. More precisely, suppose that the matrix $A$ obeys (1.2) with

$$ \delta_{\kappa s} < \delta^* \tag{1.3} $$

for suitable constants $\kappa \ge 1$ and $\delta^* < 1$. Then these algorithms precisely recover all $s$-sparse
vectors $x$ from the measurements $y = Ax$. More generally, when the vector $x$ is arbitrary
and we acquire noisy observations

$$ y = Ax + e \quad \text{where} \quad \|e\|_2 \le \tau, $$

these algorithms return a reconstruction $x^\sharp$ that satisfies an error bound of the form

$$ \|x - x^\sharp\|_2 \le C_1 \frac{\sigma_s(x)_1}{\sqrt{s}} + C_2 \tau, \tag{1.4} $$

where $\sigma_s(x)_1 = \inf_{\|z\|_0 \le s} \|x - z\|_1$ denotes the error of best $s$-term approximation in $\ell_1$
and $C_1, C_2$ are positive constants. Table 1 lists the best values available for the constants
$\kappa$ and $\delta^*$ for several algorithms along with appropriate references.

Table 1: Values of the constants $\kappa$ and $\delta^*$ in (1.3) for various recovery algorithms.



A Gaussian random matrix $A \in \mathbb{R}^{m \times n}$ is a matrix that has independent, normally distributed
entries with mean zero and variance one. It is shown, e.g., in [3, 11, 31] that the
restricted isometry constants of $\frac{1}{\sqrt{m}} A$ satisfy $\delta_s \le \delta$ with high probability provided that

$$ m \ge C \delta^{-2} s \log(n/s). $$

It follows that the number $m$ of Gaussian measurements required to reconstruct an $s$-sparse
signal of length $n$ is linear in the sparsity and logarithmic in the ambient dimension.
See [3, 11, 18, 31, 38] for precise statements and extensions to Bernoulli and subgaussian
matrices. It follows from lower estimates of Gelfand widths that this bound on the required
samples is optimal [Hl[22l|23] > that is, the log-factor must be present. 

For a matrix consisting of $m$ random rows from an $n \times n$ discrete Fourier transform matrix,
slightly weaker estimates are available [11, 36, 38, 44]. The restricted isometry constants of
this matrix satisfy $\delta_s \le \delta$ with high probability provided that

$$ m \ge C \delta^{-2} s \log^3(s) \log(n). $$
1.2 Partial Random Circulant Matrices

Given a vector $\phi = (\phi_0, \dots, \phi_{n-1})^T \in \mathbb{R}^n$, we introduce the circulant matrix

$$ \Phi^c = \begin{pmatrix} \phi_0 & \phi_{n-1} & \cdots & \phi_1 \\ \phi_1 & \phi_0 & \cdots & \phi_2 \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{n-1} & \phi_{n-2} & \cdots & \phi_0 \end{pmatrix} \in \mathbb{R}^{n \times n}. \tag{1.5} $$

Square matrices are not very interesting for compressed sensing, so we will restrict our
attention to a row submatrix of $\Phi^c$. Consider an arbitrary index set $\Omega \subset \{0, 1, \dots, n-1\}$
whose cardinality $|\Omega| = m$. We define the operator $R_\Omega \in \mathbb{R}^{m \times n}$ that restricts a vector
to the entries listed in $\Omega$. Then the corresponding partial circulant matrix generated with
$\phi \in \mathbb{R}^n$ is defined as

$$ \Phi = \frac{1}{\sqrt{m}} R_\Omega \Phi^c. \tag{1.6} $$

The action of $\Phi$ can be interpreted as a circular convolution with the sequence $\frac{1}{\sqrt{m}} \phi$ followed
by a subsampling at locations indexed by $\Omega$.

We will demonstrate that a partial circulant matrix with a random generator has small
restricted isometry constants. As a result, we can recover a sparse vector $x$ robustly from
measurements $y = \Phi x$ using any of the algorithms mentioned above. Since $\Phi^c$ can be
diagonalized via the Fourier transform, the matrices $\Phi$ and $\Phi^*$ both admit fast matrix-vector
multiplication using the FFT algorithm. This fact allows us to accelerate recovery
algorithms substantially.

A Rademacher sequence $\epsilon = (\epsilon_1, \dots, \epsilon_n)^T$ is a sequence of independent random variables,
each taking the values $+1$ and $-1$ with equal probability. In the sequel, the matrix $\Phi$ in
(1.6) will always be generated by a Rademacher sequence $\phi = \epsilon$, and we will refer to it as
a partial random circulant matrix.
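
The measurement model can be illustrated with a short sketch: apply a partial random circulant matrix to a sparse vector by circular convolution followed by subsampling, and compare with the explicit matrix. The helper names `partial_circulant_apply` and `partial_circulant_matrix`, the zero-based indexing of the generator, and the example sizes are assumptions made for illustration. The sketch uses a direct $O(mn)$ sum rather than the FFT-based multiply mentioned in the text.

```python
import math
import random

def partial_circulant_apply(phi, omega, x):
    """Return (1/sqrt(m)) * (phi circularly convolved with x), sampled on omega."""
    n, m = len(phi), len(omega)
    return [sum(phi[(j - k) % n] * x[k] for k in range(n)) / math.sqrt(m)
            for j in omega]

def partial_circulant_matrix(phi, omega):
    """Explicit m x n matrix with entries phi[(j - k) mod n] / sqrt(m) for j in omega."""
    n, m = len(phi), len(omega)
    return [[phi[(j - k) % n] / math.sqrt(m) for k in range(n)] for j in omega]

rng = random.Random(0)
n, omega = 8, [0, 3, 5]                          # arbitrary (nonrandom) sample set
phi = [rng.choice([-1, 1]) for _ in range(n)]    # Rademacher generator
x = [0.0] * n
x[2], x[6] = 1.5, -0.5                           # 2-sparse test vector
y = partial_circulant_apply(phi, omega, x)       # m = 3 measurements
```

Row $j$ of the explicit matrix is the generator reversed and cyclically shifted by $j$, which is why the matrix-vector product coincides with the subsampled convolution.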



The main result of this paper is the following theorem.

Theorem 1.1 Let $\Omega$ be an arbitrary subset of $\{0, 1, \dots, n-1\}$ with cardinality $|\Omega| = m$. Let
$\Phi$ be the corresponding partial random circulant matrix (1.6) generated by a Rademacher
sequence, and let $\delta_s$ denote the $s$th restricted isometry constant. Then

$$ \mathbb{E}[\delta_s] \le C_1 \max\left\{ \frac{s^{3/2}}{m} \log^{3/2} n, \ \sqrt{\frac{s}{m}} \log s \log n \right\}, \tag{1.7} $$

where $C_1 > 0$ is a universal constant.

In particular, (1.7) implies that for given $\delta \in (0, 1)$, we have $\mathbb{E}[\delta_s] \le \delta$ provided

$$ m \ge C_2 \max\left\{ \delta^{-1} s^{3/2} \log^{3/2} n, \ \delta^{-2} s \log^2 n \log^2 s \right\}, \tag{1.8} $$

where $C_2 > 0$ is another universal constant.
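
To see how the two terms in the sample bound (1.8) compete, one can evaluate its right-hand side numerically. The sketch below is only an illustration of the scaling: the theorem does not specify the universal constant, so the parameter `c2` (default 1.0) is a placeholder assumption, and the function name is hypothetical.

```python
import math

def samples_for_expected_rip(s, n, delta, c2=1.0):
    """Right-hand side of (1.8); c2 stands in for the unspecified constant C_2."""
    log_n, log_s = math.log(n), math.log(s)
    # Term inherited from the subexponential integral.
    subexp_term = s ** 1.5 * log_n ** 1.5 / delta
    # Term inherited from the subgaussian integral.
    subgauss_term = s * log_n ** 2 * log_s ** 2 / delta ** 2
    return c2 * max(subexp_term, subgauss_term)

# Scaling in the sparsity for a fixed pulse length and target constant.
scaling = [samples_for_expected_rip(s, 2 ** 20, 0.5) for s in (10, 100, 1000)]
```

For fixed $\delta$ and large $s$, the first term dominates, which is the $s^{3/2}$ behavior discussed in the text.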

Theorem 1.1 tells us that partial random circulant matrices $\Phi$ obey (1.2) in expectation.
The following theorem states that the random variable $\delta_s$ does not deviate much from its
mean.

Theorem 1.2 Let $\delta_s$ be as in Theorem 1.1. Then for $0 \le \lambda \le 1$,

$$ \mathbb{P}(\delta_s \ge \mathbb{E}[\delta_s] + \lambda) \le e^{-\lambda^2/\sigma^2} \quad \text{where} \quad \sigma^2 = C_3 \frac{s}{m} \log^2 s \log^2 n, $$

for a universal constant $C_3 > 0$.

The proof of Theorem 1.1 is connected with the approach for random partial Fourier matrices
[38, 44]. We use a version of the classical Dudley inequality for Rademacher chaos
that bounds the expectation of its supremum by the maximum of two entropy integrals
that involve covering numbers with respect to two different metrics. We use elementary
ideas from Fourier analysis to provide bounds for these metrics. This reduction allows us to
exploit covering number estimates from the RIP analysis for partial Fourier matrices [38, 44]
to complete the argument.

1.3 Discussion 

In essence, the bound (1.8) exhibits the scaling behavior $m \gtrsim s^{3/2} \log^{3/2} n$. This result
improves on the best available result for this type of matrix [25], but it falls short of the
linear scaling in $s$ that is typical in the compressive sensing literature. The bottleneck in our
argument appears to be the bound on the "subexponential integral" (Section 2.4). It is not
clear how to significantly improve (2.12) or the covering numbers from Proposition 2.3, so
tightening this bound to $m \gtrsim s \log^p n$ for some constant $p$ will probably require a different
approach. Indeed, it is known that the central tool in this paper, the Dudley-type inequality
for Rademacher chaos stated in Proposition 2.2, is not sharp for all examples [29, 45]. It
might be that we are facing one of these cases.

The statement of Theorem 1.1 uses a very specific model for the measurement matrix based
on a partial random convolution with a generator $\phi$ given by a Rademacher sequence. We
have restricted our discussion to this example to simplify the exposition. Analogous results 
for other types of random generator sequences can be derived using the same type of 
analysis. In particular, one might consider the following variations. 

Gaussian generating sequence. We can take the sequence $\phi$ to be iid Gaussian with
zero mean and variance one. In this case, we can establish (1.8) by repeating the same
steps because the central tool, Proposition 2.2, holds for Gaussian chaos processes as
well as Rademacher chaos processes (perhaps with different constants). It is possible
that, for this case, there is an extra factor of $\log n$ in the denominator of the variance
$\sigma^2$ in the tail bound in Theorem 1.2.

Fourier-domain randomness. The generating sequence can also be iid Bernoulli in the
Fourier domain. That is, we can take $\phi = \sqrt{n} F^{-1} \epsilon$, where $F$ is the Fourier matrix
(see below) and $\epsilon$ is a Rademacher sequence. The analysis in this case is almost
identical, except that we take the Fourier-domain expression (2.5) for the random
process as our starting point.

This type of model was analyzed in [43] for the case where also $\Omega$ is chosen randomly;
Theorem 1.1 gives us a result when $\Omega$ is arbitrary. Our model is also related to
the random demodulator system analyzed in [U]. If we switch the roles of time and
frequency, we can interpret the measurement system $\Phi$ as taking a signal that is
sparse in the Fourier domain, multiplying it pointwise by a Rademacher sequence
in the time domain (the random demodulation), and then recording the frequency
components indexed by the set $\Omega$. If $\Omega$ consists of a sequence of consecutive indices,
then this operation is equivalent to random demodulation followed by bandpass
filtering and, finally, acquiring $m$ uniformly spaced samples. (In our model, we are
observing the Fourier transform of the samples rather than the samples themselves.)
This observation broadens the "randomly demodulate, integrate, then subsample"
architecture of [U] to "randomly demodulate, bandpass filter, then subsample".

Complex generating sequences. It is also possible to take the generating sequence to be
either a complex Steinhaus or complex Gaussian sequence. The proofs above remain
essentially the same; the main difference would be establishing complex versions of
Proposition 2.2 and Proposition 3.1 (some related results for Steinhaus sequences can
be found in [?, Ch. 4]).

Toeplitz matrices. We can obtain analogous results for sections of a random Toeplitz
matrix because a Toeplitz matrix can be embedded in a circulant matrix of twice the
dimension.

Applications. From an engineering point of view, Theorems 1.1 and 1.2 tell us that we
can identify a system with a sparse impulse response by probing it with a random input
sequence and then taking a small number of samples of the output. This type of system
identification, or deconvolution, problem is a common task in signal processing, and the fact
that it can be performed from a small number of samples allows for some interesting new
design considerations.

For example, in radar imaging a transmitter sends out a pulse, which reflects off of a
number of targets, and then a receiver observes this superposition of pulses (which can
be modeled as the original pulse convolved with an unknown sparse range profile). The
resolution to which we can resolve target locations is determined by the bandwidth of
the pulse; to reconstruct the range profile digitally, then, the system requires an analog-to-digital
converter (ADC) whose sample rate is on the same order as this bandwidth.
Typical pulse bandwidths are in the gigahertz range, and ADCs that operate at this rate
are expensive and low resolution. Indeed, lack of good high-speed ADCs has "historically
slowed the introduction of digital techniques into radar signal processing" [41, Chap. 1.2].
Theorem 1.1 suggests that the sample rate of the ADC depends primarily on the sparsity of
the range profile, rather than the bandwidth of the pulse, which would allow us to achieve
the same resolution with less expensive and more accurate hardware. See [26, 43] for more
discussion of how convolution with a random waveform followed by subsampling can be
applied to active imaging problems.

Another application of sparse recovery from a random convolution is increasing the field-of-view
of a camera using a coded aperture [24, 30]. Here, we can imagine an optical
architecture where a large image is convolved with a random code and then a small spatial
portion of the result is sampled on a compact pixel array. If the image is sparse enough,
Theorem 1.1 suggests that the entire field-of-view can be reconstructed to full resolution
from this small set of observations.

Dimensionality reduction. The Johnson-Lindenstrauss lemma is an important tool for
dimensionality reduction. It establishes that the pairwise distances between points in a
high-dimensional space are approximately preserved after we project the points into a significantly
lower-dimensional space using a random linear map. While Gaussian or Bernoulli
matrices were initially used for this task, more recent analyses show that structured random
matrices also work. In particular, Hinrichs and Vybíral [27] have shown that one
can perform dimension reduction using partial random circulant matrices with randomized
column signs. These matrices are computationally efficient because they can be applied
using the FFT algorithm. Krahmer and Ward subsequently showed that a matrix satisfying
the RIP provides a Johnson-Lindenstrauss embedding if one randomizes the column
signs [28]. Together with our result on the RIP of partial random circulant matrices, the
work of Krahmer and Ward improves on a related result by Vybíral [19]. See [28] for a
precise statement.

1.4 Relationship with previous work 

Numerical results for compressive sampling by random convolution followed by subsampling
appear in [48]. In that paper, the measurement process convolves with a pulse of length
$B$ and then extracts $m$ equally spaced samples from the convolution. The effectiveness of
this strategy is quantified empirically as a function of pulse length and the undersampling 
ratio: for long enough pulses, the number of samples required to reconstruct a signal is 
approximately linear in the sparsity. 

Later, theoretical results for compressed sensing using random convolution were developed
in [42, 43]. In these works, the measurement model is slightly different; the generating
sequence $\phi$ is the discrete Fourier transform of an iid sequence of random signs. Convolution
with this spectrally random sequence is followed by sampling at random locations (as
opposed to the arbitrary set we are considering in this work). This process is univer- 
sally efficient, in that an s-sparse signal can be reconstructed from m > slogan samples 
independent of the orthobasis in which it is sparse. 

The first theoretical results for the measurement model we are using in this paper, convolution
with an iid sequence followed by subsampling at fixed locations, can be traced to [2, 25].
They show that the matrix $\Phi$ defined in (1.6) has the RIP of order $s$ with high probability
when $m \gtrsim s^2 \log n$. These works are couched in the language of channel estimation, and
so the results are stated explicitly for the case where $\Omega$ contains consecutive indices. Nevertheless,
it appears that the same proof strategy extends to arbitrary $\Omega$. Theorems 1.1
and 1.2 refine the sufficient condition for this model to $m \gtrsim s^{3/2} \log^{3/2} n$ using a completely
different mathematical analysis.

A nonuniform recovery result for partial random circulant matrices has been established
in [37, 38]. (See [38] for a discussion of the difference between nonuniform and uniform
recovery guarantees.) Suppose that the number $m$ of measurements satisfies $m \gtrsim s \log^2 n$.
Let $x_0$ be an $s$-sparse vector whose nonzero components have random signs. With high
probability, we can recover this vector exactly via $\ell_1$-minimization using the measurements
$y = \Phi x_0$, where the partial random circulant matrix $\Phi$ is drawn independently from $x_0$.
The proof involves duality for convex optimization, and it does not establish any type
of RIP. As a consequence, this work does not offer any guarantees about stability in the
presence of measurement noise or robustness when the signal $x_0$ is not exactly $s$-sparse, in
contrast with the RIP recovery bound (1.4).
2 Proof of Theorem 1.1 (Expectation)

We develop a method for estimating the restricted isometry constant $\delta_s$ for a fixed sparsity
level $s$. Let $T$ denote the set of all $s$-sparse signals in the Euclidean unit ball:

$$ T := \{x \in \mathbb{R}^n : \|x\|_0 \le s, \ \|x\|_2 \le 1\}. \tag{2.1} $$

Define a function $|||\cdot|||$ on Hermitian $n \times n$ matrices via the formula

$$ |||A||| := \sup_{x \in T} |x^* A x|. $$

This function can be extended to a norm on the set of all square matrices. We work with
the quantity $|||\Phi^*\Phi - I|||$ because

$$ |||\Phi^*\Phi - I||| = \sup_{x \in T} |\langle (\Phi^*\Phi - I)x, x \rangle| = \sup_{x \in T} \left| \|\Phi x\|_2^2 - \|x\|_2^2 \right| = \delta_s. \tag{2.2} $$

Let $S$ be the cyclic shift down operator on column vectors in $\mathbb{R}^n$. Applying the power $S^k$
to $x$ cycles $x$ downward by $k$ coordinates: $(S^k x)_\ell = x_{\ell \ominus k}$, where $\ominus$ is subtraction modulo
$n$. Note that $(S^k)^* = S^{-k} = S^{n-k}$. We can now express $\Phi$ as a random sum of shift
operators,

$$ \Phi = \frac{1}{\sqrt{m}} \sum_{k=1}^{n} \epsilon_k R_\Omega S^k. $$

It follows that

$$ \Phi^*\Phi - I = \frac{1}{m} \sum_{k \ne \ell} \epsilon_k \epsilon_\ell \, S^{-k} P_\Omega S^\ell, \tag{2.3} $$

where $P_\Omega = R_\Omega^* R_\Omega$ is the $n \times n$ diagonal projector onto the coordinates in $\Omega$. Applying $P_\Omega$
to $x$ preserves the values of $x$ on the set $\Omega$ while setting the values outside of $\Omega$ to zero.
The diagonal terms $k = \ell$ do not appear in (2.3) because $\frac{1}{m} \sum_{k} S^{-k} P_\Omega S^k = I$.

Combining (2.2) and (2.3), we can view the restricted isometry constant as the supremum
of a random process indexed by the set $T$:

$$ \delta_s = \sup_{x \in T} |G_x| \quad \text{where} \quad G_x = \frac{1}{m} \sum_{k \ne \ell} \epsilon_k \epsilon_\ell \, x^* S^{-k} P_\Omega S^\ell x. \tag{2.4} $$

We must bound the expected supremum of this process.
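
The identity behind (2.4), namely that the off-diagonal chaos $G_x$ equals the isometry defect $\|\Phi x\|_2^2 - \|x\|_2^2$, can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the shifts are indexed by $k = 0, \dots, n-1$ rather than $1, \dots, n$, which describes the same family of operators because $S^n = I$.

```python
import random

def shift(x, k):
    """Cyclic shift down by k: (S^k x)[l] = x[(l - k) mod n]."""
    n = len(x)
    return [x[(l - k) % n] for l in range(n)]

def chaos_value(eps, omega, x):
    """G_x = (1/m) * sum over k != l of eps_k eps_l <S^k x, P_Omega S^l x>."""
    n, m = len(x), len(omega)
    shifted = [shift(x, k) for k in range(n)]
    total = 0.0
    for k in range(n):
        for l in range(n):
            if k != l:
                total += eps[k] * eps[l] * sum(shifted[k][j] * shifted[l][j] for j in omega)
    return total / m

def isometry_defect(eps, omega, x):
    """||Phi x||_2^2 - ||x||_2^2 with Phi = (1/sqrt(m)) * sum_k eps_k R_Omega S^k."""
    n, m = len(x), len(omega)
    shifted = [shift(x, k) for k in range(n)]
    phix_sq = sum(sum(eps[k] * shifted[k][j] for k in range(n)) ** 2 for j in omega) / m
    return phix_sq - sum(v * v for v in x)

rng = random.Random(1)
n, omega = 7, [1, 2, 6]
eps = [rng.choice([-1, 1]) for _ in range(n)]   # Rademacher generator
x = [rng.uniform(-1, 1) for _ in range(n)]      # arbitrary real test vector
gap = abs(chaos_value(eps, omega, x) - isometry_defect(eps, omega, x))
```

The two quantities agree up to floating-point error because the diagonal terms $k = \ell$ contribute exactly $\|x\|_2^2$: each coordinate of $x$ visits every position of $\Omega$ once as $k$ ranges over all shifts.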



2.1 Fourier representation of the random process 

One of the key ideas in this work is to re-express the random process $G_x$ in the Fourier
domain. Let $F$ be the $n \times n$ discrete Fourier transform matrix whose entries are given by
the expression

$$ F(\omega, \ell) := e^{-i 2\pi \omega \ell / n}, \quad 0 \le \omega, \ell \le n - 1. $$

Note that we employ the electrical engineering convention that $F$ is unnormalized. The
hat symbol indicates the Fourier transform of a vector: $\hat{x} := Fx$. Recall that a shift in the
time domain followed by a Fourier transform can also be written as a Fourier transform
followed by a frequency modulation:

$$ F S^k = M^k F, $$

where $M$ is the diagonal matrix with entries $M(\omega, \omega) := e^{-i 2\pi \omega / n}$ for $0 \le \omega \le n - 1$.
The random process $G_x$ has the Fourier-domain representation

$$ G_x = \frac{1}{m} \sum_{k \ne \ell} \epsilon_k \epsilon_\ell \, \hat{x}^* M^{-k} \hat{P}_\Omega M^\ell \hat{x}, \tag{2.5} $$

where $\hat{P}_\Omega = n^{-1} F P_\Omega F^{-1}$. The matrix $\hat{P}_\Omega$ has several nice properties that we use in the
sequel.
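
The shift-modulation relation used above can be verified directly with the unnormalized transform. The following sketch uses a naive $O(n^2)$ DFT from the standard library; the function names and the test vector are illustrative assumptions.

```python
import cmath

def dft(x):
    """Unnormalized DFT: x_hat[w] = sum over l of exp(-2*pi*i*w*l/n) * x[l]."""
    n = len(x)
    return [sum(cmath.exp(-2j * cmath.pi * w * l / n) * x[l] for l in range(n))
            for w in range(n)]

def shift(x, k):
    """Cyclic shift down by k: (S^k x)[l] = x[(l - k) mod n]."""
    n = len(x)
    return [x[(l - k) % n] for l in range(n)]

def modulate(x_hat, k):
    """Apply M^k, the diagonal matrix with entries exp(-2*pi*i*w*k/n)."""
    n = len(x_hat)
    return [cmath.exp(-2j * cmath.pi * w * k / n) * x_hat[w] for w in range(n)]

x = [1.0, -2.0, 0.5, 0.0, 3.0]
lhs = dft(shift(x, 2))        # F S^2 x
rhs = modulate(dft(x), 2)     # M^2 F x
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The two sides agree up to rounding, which is the step that turns each shift $S^\ell$ in the process into the modulation $M^\ell$ acting on $\hat{x}$.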

Lemma 2.1 The $n \times n$ matrix $\hat{P}_\Omega = n^{-1} F P_\Omega F^{-1}$ has the following properties:

1. $\hat{P}_\Omega$ is circulant and conjugate symmetric.

2. Along the diagonal $\hat{P}_\Omega(\omega, \omega) = m/n^2$, and off the diagonal $|\hat{P}_\Omega(\omega, \xi)| \le m/n^2$.

3. Since the rows and columns of $\hat{P}_\Omega$ are circular shifts of one another,

$$ \sum_{\omega} |\hat{P}_\Omega(\omega, \xi)|^2 = \sum_{\xi} |\hat{P}_\Omega(\omega, \xi)|^2 = \|\hat{P}_\Omega\|_F^2 / n = m/n^3. $$

4. $\hat{P}_\Omega$ has exactly $m$ nonzero eigenvalues, each of which is equal to $1/n$. As such, $\hat{P}_\Omega$
has spectral norm $\|\hat{P}_\Omega\| = 1/n$ and Frobenius norm $\|\hat{P}_\Omega\|_F = \sqrt{m}/n$.

Proof These properties follow almost immediately from the fact that $P_\Omega = R_\Omega^* R_\Omega$ is a
diagonal matrix with 0-1 entries. The matrix $\hat{P}_\Omega$ inherits conjugate symmetry from $P_\Omega$.
The matrix $\hat{P}_\Omega$ is circulant because it is diagonalized by the Fourier transform. Since we
form $\hat{P}_\Omega$ by applying a similarity transform to $P_\Omega$, they have the same eigenvalues modulo
the scale factor $n^{-1}$. The remaining points follow from the simple calculations described in
the statement of the lemma. ■
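
The properties in Lemma 2.1 are easy to confirm numerically for a small example. The sketch below builds the matrix entrywise from the formula $(1/n^2) \sum_{j \in \Omega} e^{-i 2\pi (\omega - \xi) j / n}$, which follows from $F^{-1} = F^*/n$; the function name `p_hat` and the choice of $n$ and $\Omega$ are assumptions made for illustration.

```python
import cmath

def p_hat(n, omega):
    """Entries of (1/n) F P_Omega F^{-1}: (1/n^2) * sum_{j in omega} e^{-2*pi*i*(w - xi)*j/n}."""
    return [[sum(cmath.exp(-2j * cmath.pi * (w - xi) * j / n) for j in omega) / n ** 2
             for xi in range(n)] for w in range(n)]

n, omega = 8, [0, 1, 4]
m = len(omega)
P = p_hat(n, omega)

# Point 1: conjugate symmetry.
symmetric_ok = all(abs(P[w][xi] - P[xi][w].conjugate()) < 1e-12
                   for w in range(n) for xi in range(n))
# Point 2: diagonal entries equal m/n^2 and no entry exceeds that in modulus.
diag_ok = all(abs(P[w][w] - m / n ** 2) < 1e-12 for w in range(n))
bound_ok = all(abs(P[w][xi]) <= m / n ** 2 + 1e-12 for w in range(n) for xi in range(n))
# Point 3: every row has energy m/n^3, so the squared Frobenius norm is m/n^2.
row_energy = [sum(abs(P[w][xi]) ** 2 for xi in range(n)) for w in range(n)]
frob_sq = sum(row_energy)
```

All checks hold up to rounding for this choice of $\Omega$; the squared Frobenius norm comes out as $m/n^2 = 3/64$.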



2.2 Integrability of chaos processes 

For the next step in the argument, we must rewrite the random process (2.5) again. Let
$\epsilon = [\epsilon_0, \dots, \epsilon_{n-1}]^*$. The process can now be expressed as a quadratic form:

$$ G_x = \langle \epsilon, Z_x \epsilon \rangle \quad \text{where} \quad x \in T. \tag{2.6} $$

The matrix $Z_x$ has entries

$$ Z_x(k, \ell) = \begin{cases} m^{-1} \hat{x}^* M^{-k} \hat{P}_\Omega M^\ell \hat{x}, & k \ne \ell, \\ 0, & k = \ell. \end{cases} $$

A short calculation verifies that this matrix can be written compactly:

$$ Z_x = \frac{1}{m} \left( F^* \hat{X}^* \hat{P}_\Omega \hat{X} F - \operatorname{diag}(F^* \hat{X}^* \hat{P}_\Omega \hat{X} F) \right), \tag{2.7} $$

where $\hat{X} := \operatorname{diag}(\hat{x})$ is the diagonal matrix constructed from the vector $\hat{x}$. The term
homogeneous second-order chaos is used to refer to a random process $G_x$ of the form (2.6)
where each matrix $Z_x$ is conjugate symmetric and hollow, i.e., has zeros on the diagonal.

To bound the expected supremum of the random process $G_x$ over the set $T$, we apply a
version of Dudley's inequality that is specialized to this setting. Define two pseudo-metrics
on the index set $T$:

$$ d_1(x, y) := \|Z_x - Z_y\| \quad \text{and} \quad d_2(x, y) := \|Z_x - Z_y\|_F. $$

Let $N(T, d_i, u)$ denote the minimum number of balls of radius $u$ in the metric $d_i$ that we
need to cover the set $T$.
Proposition 2.2 (Dudley's inequality for chaos) Suppose that $G_x$ is a homogeneous
second-order chaos process indexed by a set $T$. Fix a point $x_0 \in T$. There exists a universal
constant $K$ such that

$$ \mathbb{E} \sup_{x \in T} |G_x - G_{x_0}| \le K \max\left\{ \int_0^\infty \log N(T, d_1, u) \, du, \ \int_0^\infty \sqrt{\log N(T, d_2, u)} \, du \right\}. \tag{2.8} $$

Proposition 2.2 is based on the idea that the random process has a subexponential part,
whose variation is controlled by the integral with respect to $d_1$, and a subgaussian part,
whose variation is controlled by the integral with respect to $d_2$. This result appears in [29,
Thm. 11.22] and [45, Thm. 2.5.2]. Our statement of the proposition looks different from
the versions presented in the literature, so we sketch the derivation in Appendix A.



2.3 The subgaussian integral 

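In this section, we develop an estimate for the second integral in (2.8). To do so, we need
a simpler bound for the metric $d_2$. First, note that

$$ \begin{aligned}
d_2(x, y) &\le \frac{1}{m} \|F^*(\hat{X}^* \hat{P}_\Omega \hat{X} - \hat{Y}^* \hat{P}_\Omega \hat{Y})F\|_F = \frac{n}{m} \|\hat{X}^* \hat{P}_\Omega \hat{X} - \hat{Y}^* \hat{P}_\Omega \hat{Y}\|_F \\
&= \frac{n}{2m} \|(\hat{X} - \hat{Y})^* \hat{P}_\Omega (\hat{X} + \hat{Y}) + (\hat{X} + \hat{Y})^* \hat{P}_\Omega (\hat{X} - \hat{Y})\|_F \\
&= \frac{n}{2m} \Big[ \sum_{\omega, \xi} |\hat{P}_\Omega(\omega, \xi)|^2 \, |Q_{x,y}(\omega, \xi)|^2 \Big]^{1/2},
\end{aligned} $$

where we have written

$$ Q_{x,y}(\omega, \xi) := (\hat{x}(\omega) + \hat{y}(\omega))(\hat{x}(\xi) - \hat{y}(\xi))^* + (\hat{x}(\omega) - \hat{y}(\omega))(\hat{x}(\xi) + \hat{y}(\xi))^*. $$

The first inequality arises when we re-introduce the diagonal entries of the hollow matrices.
The next identity follows from the unitary invariance of the Frobenius norm, since $F/\sqrt{n}$ is
unitary. The second line is the polarization identity, and we obtain the last line by expressing
the Frobenius norm in terms of coordinates.

Define the $\|\cdot\|_{\hat\infty}$ norm to be the $\ell_\infty$ norm in the discrete Fourier domain:

$$ \|x\|_{\hat\infty} := \|\hat{x}\|_\infty. $$

We can bound the entries of $Q_{x,y}$ in terms of this norm. If we abbreviate $v = x + y$,

$$ |Q_{x,y}(\omega, \xi)| \le \|x - y\|_{\hat\infty} \cdot (|\hat{v}(\omega)| + |\hat{v}(\xi)|). $$

Introduce the latter bound into our estimate for the metric and apply the triangle inequality
to reach

$$ d_2(x, y) \le \frac{n}{2m} \|x - y\|_{\hat\infty} \Big[ \Big( \sum_{\omega, \xi} |\hat{P}_\Omega(\omega, \xi)|^2 |\hat{v}(\omega)|^2 \Big)^{1/2} + \Big( \sum_{\omega, \xi} |\hat{P}_\Omega(\omega, \xi)|^2 |\hat{v}(\xi)|^2 \Big)^{1/2} \Big]. $$

Let us examine the first sum more closely.

$$ \Big( \sum_{\omega, \xi} |\hat{P}_\Omega(\omega, \xi)|^2 |\hat{v}(\omega)|^2 \Big)^{1/2} = \sqrt{\frac{m}{n^3}} \, \|\hat{v}\|_2 \le \sqrt{\frac{m}{n^3}} \, (\|\hat{x}\|_2 + \|\hat{y}\|_2) \le \frac{2\sqrt{m}}{n}. $$

The first identity follows from Point 3 of Lemma 2.1. The second relation is the triangle
inequality. The last bound is a consequence of the fact that $x$ and $y$ have at most unit energy
together with Parseval's identity, which reads $\|\hat{x}\|_2 = \sqrt{n} \, \|x\|_2$ for the unnormalized
transform. An analogous argument applies to the second sum. We conclude that

$$ d_2(x, y) \le \frac{2 \|x - y\|_{\hat\infty}}{\sqrt{m}} = 2 \sqrt{\frac{s}{m}} \cdot \frac{1}{\sqrt{s}} \|x - y\|_{\hat\infty}. $$

This bound on $d_2$ allows us to estimate the subgaussian integral in terms of the covering
numbers of $T$ with respect to the norm $s^{-1/2} \|\cdot\|_{\hat\infty}$. Abbreviating $a = 2\sqrt{s/m}$, we compute
that

$$ \begin{aligned}
I_2 := \int_0^\infty \sqrt{\log N(T, d_2, u)} \, du &\le \int_0^\infty \sqrt{\log N(T, a s^{-1/2} \|\cdot\|_{\hat\infty}, u)} \, du \\
&= \int_0^\infty \sqrt{\log N(T, s^{-1/2} \|\cdot\|_{\hat\infty}, a^{-1} u)} \, du = a \int_0^\infty \sqrt{\log N(T, s^{-1/2} \|\cdot\|_{\hat\infty}, u)} \, du \\
&= 2 \sqrt{\frac{s}{m}} \int_0^1 \sqrt{\log N(T, s^{-1/2} \|\cdot\|_{\hat\infty}, u)} \, du.
\end{aligned} $$

The first inequality uses the fact that the metric balls in $d_2$ are larger than the balls in the
norm $a s^{-1/2} \|\cdot\|_{\hat\infty}$ because the metric is smaller than the norm. The second line follows
from an elementary scaling property of covering numbers along with a change of variables
in the integral. Finally, we apply the fact that $T$ is contained in the unit ball of $s^{-1/2} \|\cdot\|_{\hat\infty}$
to see that the integrand vanishes for $u \ge 1$. We can now exploit some covering number
estimates that appear in the literature [38, 44]. The first bound follows from a volume
comparison argument; the second uses the empirical method invented by Maurey [35] and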
refined by Carl p]. 



Proposition 2.3 (Covering Numbers) For $u \in (0, 1]$, we have the following bound.

$$ N(T, s^{-1/2} \|\cdot\|_{\hat\infty}, u) \le \min\left\{ \Big( \frac{C_1 n}{s} \Big)^s (1 + 2/u)^s, \ n^{C_2 (\log n)/u^2} \right\}. \tag{2.9} $$

The $C_i$ are positive universal constants.
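Explicit values of the constants $C_1, C_2$ can be found in [38, Lem. 8.3, eq. (8.14)]. We finish
off the estimate for the subgaussian integral using Proposition 2.3. Splitting the integral at $\lambda$, we
have

$$ \begin{aligned}
I_2 &\le C \sqrt{\frac{s}{m}} \left[ \int_0^\lambda \sqrt{s (\log(n/s) + \log(1 + 2/u))} \, du + \log(n) \int_\lambda^1 u^{-1} \, du \right] \\
&\le C \sqrt{\frac{s}{m}} \left[ \lambda \sqrt{s \log(n/s)} + \lambda \sqrt{s \log(\lambda^{-1})} + \log(n) \log(\lambda^{-1}) \right] \\
&\le C' \sqrt{\frac{s \log^2(s) \log^2(n)}{m}},
\end{aligned} \tag{2.10} $$

where we have chosen $\lambda$ as a negative power of $s$ in the last step (for instance, $\lambda = s^{-1/2}$ suffices).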



2.4 The subexponential integral 

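We can also bound the $d_1$ metric in terms of the norm $\|\cdot\|_{\hat\infty}$, which allows us to re-use
the estimates for the covering numbers given in Proposition 2.3 to control the first integral
in (2.8). To begin,

$$ d_1(x, y) = \|A_{x,y} - D_{x,y}\| \le \|A_{x,y}\| + \|D_{x,y}\|, $$

where the matrix $A_{x,y}$ is given by the expression

$$ \begin{aligned}
A_{x,y} &:= \frac{1}{m} F^*(\hat{X}^* \hat{P}_\Omega \hat{X} - \hat{Y}^* \hat{P}_\Omega \hat{Y}) F \\
&= \frac{1}{2m} F^* \big( (\hat{X} + \hat{Y})^* \hat{P}_\Omega (\hat{X} - \hat{Y}) + (\hat{X} - \hat{Y})^* \hat{P}_\Omega (\hat{X} + \hat{Y}) \big) F,
\end{aligned} \tag{2.11} $$

and $D_{x,y}$ denotes the diagonal of the matrix $A_{x,y}$.

We bound the diagonal term first. Let $f_k$ be the $k$th column of $F$, and note that $\|f_k\|_2^2 = n$.
Owing to Lemma 2.1, we have

$$ \begin{aligned}
\|D_{x,y}\| &\le \frac{1}{m} \max_k |f_k^* (\hat{X} + \hat{Y})^* \hat{P}_\Omega (\hat{X} - \hat{Y}) f_k| \\
&\le \frac{1}{m} \max_k \|\hat{P}_\Omega (\hat{X} + \hat{Y}) f_k\|_2 \cdot \|(\hat{X} - \hat{Y}) f_k\|_2 \\
&\le \frac{1}{m} \max_k \|\hat{P}_\Omega\| \cdot \|\hat{X} + \hat{Y}\| \cdot \|\hat{X} - \hat{Y}\| \cdot \|f_k\|_2^2 = \frac{1}{m} \|x + y\|_{\hat\infty} \cdot \|x - y\|_{\hat\infty} \\
&\le \frac{2s}{m} \cdot \frac{1}{\sqrt{s}} \|x - y\|_{\hat\infty}.
\end{aligned} $$

In the last inequality, we have used the fact that $\|x + y\|_{\hat\infty} \le \|x + y\|_1 \le 2\sqrt{s}$ for $x, y \in T$.
For the off-diagonal term, we use Lemma 2.1 to compute

$$ \begin{aligned}
\|A_{x,y}\| &\le \frac{1}{m} \|F^* (\hat{X} + \hat{Y})^* \hat{P}_\Omega (\hat{X} - \hat{Y}) F\| = \frac{n}{m} \|(\hat{X} + \hat{Y})^* \hat{P}_\Omega (\hat{X} - \hat{Y})\| \\
&\le \frac{n}{m} \|\hat{X} + \hat{Y}\| \cdot \|\hat{P}_\Omega\| \cdot \|\hat{X} - \hat{Y}\| = \frac{1}{m} \|x + y\|_{\hat\infty} \cdot \|x - y\|_{\hat\infty} \\
&\le \frac{2s}{m} \cdot \frac{1}{\sqrt{s}} \|x - y\|_{\hat\infty}.
\end{aligned} $$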
In summary,

$$ d_1(x, y) \le \frac{4s}{m} \cdot \frac{1}{\sqrt{s}} \|x - y\|_{\hat\infty}. \tag{2.12} $$

The covering number estimates of Proposition 2.3 allow us to bound the subexponential
integral.

$$ \begin{aligned}
I_1 := \int_0^\infty \log N(T, d_1, u) \, du &\le \frac{4s}{m} \int_0^1 \log N(T, s^{-1/2} \|\cdot\|_{\hat\infty}, u) \, du \\
&\le \frac{Cs}{m} \left[ \int_0^\lambda (s \log(n/s) + s \log(1 + 2/u)) \, du + \log^2(n) \int_\lambda^1 u^{-2} \, du \right] \\
&\le \frac{Cs}{m} \left( \lambda s \log(n/s) + \lambda s \log(1 + 2/\lambda) + \lambda^{-1} \log^2(n) \right) \\
&\le C' \frac{s^{3/2} \log^{3/2}(n)}{m}.
\end{aligned} \tag{2.13} $$

We have taken $\lambda = s^{-1/2} \log^{1/2}(n)$ in the last step.



2.5 Denouement 

As we noted in (2.4), the restricted isometry constant $\delta_s$ is given by the supremum of
the random process $G_x$. To compute the expectation of this supremum, we simply apply
Proposition 2.2. Select $x_0 = 0$ so that $G_{x_0} = 0$. Introduce the estimate (2.13) for the subexponential
integral and (2.10) for the subgaussian integral into Dudley's inequality (2.8):

$$ \mathbb{E} \, \delta_s = \mathbb{E} \sup_{x \in T} |G_x| \le C \max\left\{ \frac{s^{3/2} \log^{3/2}(n)}{m}, \ \sqrt{\frac{s \log^2(s) \log^2(n)}{m}} \right\}. $$

This point completes the proof.



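3 Proof of Theorem 1.2 (Tail Bound)

In this section, we develop a tail bound on the supremum of the process $G_x$. We require the
following result, which is Theorem 17 from [5]. Let $\mathcal{F}$ be a collection of $n \times n$ symmetric
real matrices, and assume that $Z(k, k) = 0$ for each $Z \in \mathcal{F}$. We are concerned with the tail
behavior of the real-valued random variable

$$ Y := \sup_{Z \in \mathcal{F}} \Big| \sum_{k, \ell = 1}^{n} \epsilon_k \epsilon_\ell Z(k, \ell) \Big|. $$

Define two variance parameters:

$$ U := \sup_{Z \in \mathcal{F}} \|Z\| \quad \text{and} \quad V^2 := \mathbb{E} \sup_{Z \in \mathcal{F}} \sum_{k=1}^{n} \Big| \sum_{\ell=1}^{n} \epsilon_\ell Z(k, \ell) \Big|^2. $$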
The parameter $V^2$ describes the variance of $Y$ near its mean, while the parameter
$U$ is the scale on which large deviations occur.

Proposition 3.1 (Tail Bound for Chaos) Under the preceding assumptions,

$$ \mathbb{P}\{Y \ge \mathbb{E}[Y] + \lambda\} \le \exp\left( -\frac{\lambda^2}{32 V^2 + 65 U \lambda / 3} \right) \tag{3.1} $$

for all $\lambda \ge 0$.
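
As a small numerical illustration, the right-hand side of the tail bound can be evaluated for given variance parameters. The constants 32 and 65/3 are taken as read from (3.1) and from the later computation in this section, where $U \le 2s/m$ produces the factor $130/3$. The function name and the sample values of $s$ and $m$ are assumptions, and the unspecified constant in the bound on $V^2$ is set to 1 purely for illustration.

```python
import math

def chaos_tail_bound(lam, v_sq, u):
    """Right-hand side of (3.1): exp(-lam^2 / (32 V^2 + 65 U lam / 3))."""
    return math.exp(-lam ** 2 / (32.0 * v_sq + 65.0 * u * lam / 3.0))

# Hypothetical sizes: sparsity s, samples m, pulse length n.
s, m, n = 10, 10 ** 6, 2 ** 14
u = 2.0 * s / m                                        # bound U <= 2s/m from this section
v_sq = (s / m) * math.log(s) ** 2 * math.log(n) ** 2   # V^2 bound with the constant set to 1
bounds = [chaos_tail_bound(lam, v_sq, u) for lam in (0.25, 0.5, 1.0)]
```

The bound decays in $\lambda$: the quadratic term $32 V^2$ governs small deviations and the linear term in $U \lambda$ governs large ones.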



Recall from (|2.4p . (|2.6p . and (|2.7p that the restricted isometry constant can be written as 



5 S = suplGa,] = supV] e k e e Z x (k,£) 

x&T x<=T k ' e 

where the matrix Z x has the expression 



Z x = A x - diag(A ;c ) for A x = — 

m 

As a consequence, Theorem 13.11 applies to the random variable 5 S . 

To bound the first parameter $U$, we first apply the triangle inequality to obtain

\[
\|Z_x\| \le \|A_x\| + \|\operatorname{diag}(A_x)\|.
\]

Emulating the arguments in Section 2.4, we can bound each of the two terms:

\[
\|A_x\| \le \frac{1}{m} \|P_\Omega\| \cdot \|X F\|^2 \le \frac{s}{m}.
\]

Similarly,

\[
\|\operatorname{diag}(A_x)\| = \frac{1}{m} \max_k f_k^* X^* P_\Omega X f_k \le \frac{s}{m}.
\]

In total, $U \le 2s/m$.



To bound the other parameter $V^2$, we use the following "vector version" of the Dudley
inequality, which we prove in the Appendix.

Proposition 3.2 Consider the vector-valued random process

\[
h_x = Z_x \varepsilon \quad\text{for } x \in T.
\]

Recall the definition of the pseudo-metric

\[
d_2(x,y) := \|Z_x - Z_y\|_F.
\]

Fix a point $x_0 \in T$. There exists a universal constant $K > 0$ such that

\[
\Bigl( \mathbb{E} \sup_{x \in T} \|h_x - h_{x_0}\|_2^2 \Bigr)^{1/2} \le K \int_0^\infty \sqrt{\log N(T, d_2, u)}\,du. \tag{3.2}
\]

With $x_0 = 0$, the left-hand side of (3.2) is precisely $V$. We have already studied the
integral on the right-hand side of (3.2) in Section 2.3. We import (2.10) to reach

\[
V^2 \le \frac{Cs}{m} \log^2(s) \log^2(n).
\]

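We are prepared to complete the tail bound for $\delta_s$. For $\lambda \le 1$,

\[
\begin{aligned}
-\frac{\lambda^2}{32C(s/m)\log^2(s)\log^2(n) + (130/3)(s/m)\lambda}
&\le -\frac{1}{C''} \min\left\{ \frac{\lambda^2}{(s/m)\log^2(s)\log^2(n)},\ \frac{\lambda}{s/m} \right\} \\
&\le -\frac{\lambda^2}{C'(s/m)\log^2(s)\log^2(n)}.
\end{aligned}
\]

Applying (3.1), we reach

\[
\mathbb{P}\{ \delta_s \ge \mathbb{E}[\delta_s] + \lambda \} \le e^{-\lambda^2/(C'\sigma^2)}
\]

with $\sigma^2 = (s/m)\log^2(s)\log^2(n)$.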
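The passage from the chaos tail bound to the simplified bound can be checked numerically. The sketch below is illustrative only. It takes the placeholder value `C = 1.0` for the unspecified universal constant, uses the bound $U \le 2s/m$, and picks $C' = 32C + 130/3$, which suffices when $\lambda \le 1$ and $\log^2(s)\log^2(n) \ge 1$. The function names are hypothetical.

```python
import math


def chaos_tail_bound(lam, U, V2):
    """Right-hand side of the chaos tail bound: exp(-lam^2 / (32 V^2 + 65 U lam / 3))."""
    return math.exp(-lam ** 2 / (32.0 * V2 + 65.0 * U * lam / 3.0))


def sigma_squared(s, m, n):
    """sigma^2 = (s/m) log^2(s) log^2(n), with natural logarithms."""
    return (s / m) * math.log(s) ** 2 * math.log(n) ** 2


def rip_tail_bound(lam, s, m, n, C=1.0):
    """Tail bound with U = 2 s / m and V^2 = C sigma^2 (C is a placeholder constant)."""
    return chaos_tail_bound(lam, U=2.0 * s / m, V2=C * sigma_squared(s, m, n))


def simplified_tail_bound(lam, s, m, n, C=1.0):
    """exp(-lam^2 / (C' sigma^2)) with C' = 32 C + 130/3.

    Dominates rip_tail_bound when lam <= 1 and log^2(s) log^2(n) >= 1.
    """
    c_prime = 32.0 * C + 130.0 / 3.0
    return math.exp(-lam ** 2 / (c_prime * sigma_squared(s, m, n)))


if __name__ == "__main__":
    args = dict(lam=0.5, s=10, m=1000, n=10000)
    print(rip_tail_bound(**args), simplified_tail_bound(**args))
```

The simplified form trades a slightly weaker exponent for an expression that depends on the parameters only through $\lambda^2/\sigma^2$.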

A A Dudley-type inequality for chaos processes 

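We provide a proof sketch for Proposition 2.2. Let $\{\varepsilon'_k\}$ be a Rademacher sequence
independent of $\{\varepsilon_k\}$. The decoupling method (see, for example, [15, Th. 3.1.2]) yields

\[
\begin{aligned}
\mathbb{E} \sup_{x \in T} |G_{x_0} - G_x|
&= \mathbb{E} \sup_{x \in T} \Bigl| \sum_{k,\ell} \varepsilon_k \varepsilon_\ell \bigl( Z_{x_0}(k,\ell) - Z_x(k,\ell) \bigr) \Bigr| \\
&\le 8\, \mathbb{E} \sup_{x \in T} \Bigl| \sum_{k,\ell} \varepsilon_k \varepsilon'_\ell \bigl( Z_{x_0}(k,\ell) - Z_x(k,\ell) \bigr) \Bigr|.
\end{aligned}
\]

Now we introduce two independent standard Gaussian sequences $\{g_k\}$ and $\{g'_k\}$. Applying
the contraction principle [29, eq. (4.8)] twice, first conditioned on $\{\varepsilon_k\}$ and then
on $\{g'_k\}$, leads to

\[
\begin{aligned}
\mathbb{E} \sup_{x \in T} |G_{x_0} - G_x|
&\le 8 \sqrt{\frac{\pi}{2}}\, \mathbb{E} \sup_{x \in T} \Bigl| \sum_{k,\ell} \varepsilon_k g'_\ell \bigl( Z_{x_0}(k,\ell) - Z_x(k,\ell) \bigr) \Bigr| \\
&\le 4\pi\, \mathbb{E} \sup_{x \in T} \Bigl| \sum_{k,\ell} g_k g'_\ell \bigl( Z_{x_0}(k,\ell) - Z_x(k,\ell) \bigr) \Bigr|.
\end{aligned}
\tag{A.1}
\]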

Thus our task is to bound the expected supremum of a decoupled Gaussian chaos process.
Using [45, Th. 1.2.7], we see that

\[
\mathbb{E} \sup_{x \in T} |G_x - G_{x_0}| \le C \bigl( \gamma_1(T, d_1) + \gamma_2(T, d_2) \bigr),
\]

where $\gamma_\alpha(T,d)$ is the $\gamma_\alpha$-functional of the metric space $(T,d)$; see
[45, Def. 1.2.5]. It is known that

\[
\gamma_\alpha(T,d) \le C \int_0^\infty \log^{1/\alpha}\bigl( N(T,d,u) \bigr)\,du.
\]

This is established carefully in [45, p. 13] for the case $\alpha = 2$, and the general case is
analogous. The statement of the proposition follows.
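Spelled out as a sketch: assuming, as its use in Section 2.5 suggests, that Proposition 2.2 is a bound by the two entropy integrals, the case $\alpha = 1$ with the metric $d_1$ and the case $\alpha = 2$ with the metric $d_2$ combine with the chaining bound as follows. The constant $C$ may change from one inequality to the next.

```latex
\begin{align*}
\mathbb{E} \sup_{x \in T} |G_x - G_{x_0}|
  &\le C \bigl( \gamma_1(T, d_1) + \gamma_2(T, d_2) \bigr) \\
  &\le C \left( \int_0^\infty \log N(T, d_1, u)\,du
       + \int_0^\infty \sqrt{\log N(T, d_2, u)}\,du \right).
\end{align*}
```

The first integral is the subexponential integral bounded in (2.13), and the second is the subgaussian integral bounded in (2.10).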
B A Dudley-type inequality for vector-valued Rademacher processes



In this section we give a sketch of the proof of Proposition 3.2, the vector-valued version of
Dudley's inequality. It is a consequence of the following proposition.

Proposition B.1 Let $A$ be an $m \times n$ matrix with columns $a_1, \dots, a_n$. For each $u > 0$,

\[
\mathbb{P}\Bigl( \Bigl\| \sum_{j=1}^{n} \varepsilon_j a_j \Bigr\|_2 \ge \|A\|_F \cdot u \Bigr) \le 2e^{-cu^2},
\]

where $c$ is a universal constant.

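Proof It is easily seen that $\mathbb{E} \| \sum_{j=1}^{n} \varepsilon_j a_j \|_2^2 = \sum_j \|a_j\|_2^2 = \|A\|_F^2$.
The vector version of Khintchine's inequality, given in [15, Th. 1.3.1], implies that, for $p \ge 2$,

\[
\Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j a_j \Bigr\|_2^p \Bigr)^{1/p}
\le \sqrt{p}\, \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j a_j \Bigr\|_2^2 \Bigr)^{1/2}
= \sqrt{p}\, \|A\|_F.
\]

This moment growth implies the tail estimate (see, e.g., [38, Proposition 6.5])

\[
\mathbb{P}\Bigl( \Bigl\| \sum_{j=1}^{n} \varepsilon_j a_j \Bigr\|_2 \ge e^{1/2} \|A\|_F \cdot u \Bigr) \le e^{-u^2/2},
\qquad u \ge \sqrt{2},
\]

which yields the conclusion. ■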

An explicit value of $c = 1/2$ for the constant above can be found using non-commutative
Khintchine inequalities [6, 38].
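The two ingredients of this proof, the second-moment identity and the $\sqrt{p}$ moment growth, can be checked exactly for a small matrix by enumerating all sign vectors. The sketch below is an illustration under that brute-force assumption; `rademacher_moment` is a hypothetical helper name, and the test matrix is arbitrary.

```python
import itertools

import numpy as np


def rademacher_moment(A, p):
    """Exact (E || sum_j eps_j a_j ||_2^p)^(1/p), averaging over all 2^n sign vectors."""
    n = A.shape[1]
    norms = np.array([
        np.linalg.norm(A @ np.array(eps))
        for eps in itertools.product([-1.0, 1.0], repeat=n)
    ])
    return float(np.mean(norms ** p) ** (1.0 / p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 6))  # arbitrary small example matrix
    frobenius = np.linalg.norm(A, "fro")
    # Second moment: equals the Frobenius norm of A.
    print(rademacher_moment(A, 2), frobenius)
    # Higher moments: compare with the sqrt(p) * ||A||_F bound.
    for p in (4, 6, 8):
        print(p, rademacher_moment(A, p), np.sqrt(p) * frobenius)
```

Enumeration costs $2^n$ evaluations, so this check is practical only for a handful of columns; a Monte Carlo average would be the usual substitute for larger $n$.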



With this proposition in place, we can prove Proposition 3.2 as follows. From Proposition B.1,

\[
\mathbb{P}\bigl( \|h_x - h_y\|_2 \ge \|Z_x - Z_y\|_F \cdot u \bigr) \le 2e^{-cu^2} \quad\text{for all } x, y \in T.
\]

This puts us in a position to follow the standard proof of Dudley's inequality for
scalar-valued subgaussian processes; see [35, Theorem 6.23] or [1, 29, 45]. One only has to
replace the triangle inequality for the absolute value by the one for $\|\cdot\|_2$ in
$\mathbb{C}^m$. This finally yields the stated conclusion.